Robert Krueger, VIS, University of Stuttgart, robert.krueger@vis.uni-stuttgart.de PRIMARY
Michael Steptoe, VADER Lab, Arizona State University, msteptoe@mainex1.su.edu
Rolando Garcia, VADER Lab, Arizona State University, rsgarci1@asu.edu
Sagarika Kadambi, VADER Lab, Arizona State University, skadambi@asu.edu
Thomas Ertl, VIS, University of Stuttgart, Thomas.ertl@vis.uni-stuttgart.de
Ross Maciejewski, VADER Lab, Arizona State University, rmacieje@asu.edu
Student Team: YES
Did you use data from both mini-challenges?
We did both mini-challenges, but for this solution (MC1) we only consider data from MC1.
We developed our own tool, based on Java (Backend) and d3 (Frontend).
In the beginning, we took a quick look at the data with Tableau.
Approximately how many hours were spent working on this submission in total?
~250 hours over one month between all the students
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES
Video Download
Video:
------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC1.1 – Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend.
a. How big is this type of group?
b. Where does this type of group like to go in the park?
c. How common is this type of group?
d. What are your other observations about this type of group?
e. What can you infer about this type of group?
f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?
Limit your response to no more than 12 images and 1000 words.
Our approach was to first process the data into patron trajectories defined by attraction check-ins around the park (71 locations including entrances) and then cluster patrons who had similar trajectories (meaning that they traveled around the park together). A large majority of attractions never posted check-in information, but one can infer check-in/out times from how long a person lingered in the general area of an attraction. We created inferred check-ins as follows:
· If a person’s location remains within a distance d of an attraction a for more than a temporal threshold t, we consider this an inferred check-in at a. We used d = 5 pixels and t = 5 minutes (analysts can vary these parameters).
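The inference rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's Java backend: the function name `infer_checkins`, the track layout (timestamp in seconds, x, y in map pixels) and the attraction dictionary are hypothetical.

```python
from math import hypot

def infer_checkins(track, attractions, d=5.0, t=300):
    """Infer check-ins from raw positions (illustrative sketch).

    track: list of (timestamp_seconds, x, y), sorted by time (assumed layout).
    attractions: dict mapping attraction id -> (x, y) in the same pixel space.
    Returns a list of (attraction_id, first_seen, last_seen) for every stay
    that lasted longer than t seconds within distance d of an attraction.
    """
    checkins = []
    current, start, last = None, None, None
    for ts, x, y in track:
        # Find the nearest attraction within distance d, if there is one.
        near, best = None, d
        for aid, (ax, ay) in attractions.items():
            dist = hypot(x - ax, y - ay)
            if dist <= best:
                near, best = aid, dist
        if near != current:
            # The visitor left the previous attraction: keep the stay if long enough.
            if current is not None and last - start > t:
                checkins.append((current, start, last))
            current, start = near, ts
        last = ts
    # Close a stay that is still open at the end of the track.
    if current is not None and last - start > t:
        checkins.append((current, start, last))
    return checkins

attractions = {12: (10, 10), 52: (50, 50)}
track = [(0, 10, 11), (120, 11, 10), (240, 10, 10), (360, 12, 12),
         (420, 30, 30), (480, 50, 50), (600, 50, 51)]
inferred = infer_checkins(track, attractions)
```

In this toy track the visitor lingers near attraction 12 for six minutes, which counts as an inferred check-in, while the two-minute stay near attraction 52 does not.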
A spatiotemporal trajectory structure capturing check-ins was created. Our observation is that if users visit attractions in the same order at the same time, then they are likely traveling the park together. We aggregate the check-ins to five-minute intervals (user adjustable). A spatiotemporal trajectory then consists of the location that a user last checked into at each interval. For example, if the user checks in at the park gate (attraction 84), takes fifteen minutes to go to the Flying TyrAndrienkos (12), stays there for 30 minutes and then goes to Tricerastop (52) for 10 minutes, their trajectory would be:
· 84-84-84-12-12-12-12-12-12-52-52-…
For a full day at the park (8AM-12AM), we have a trajectory of length 192 (16 hours at twelve five-minute intervals per hour). We combine all three days into one trajectory, and a “not-in-park” value is stored whenever the user is out of the park. The trajectory structure can also be adjusted to represent park regions (five park regions) or attraction categories (thrill ride, etc. – 10 total).
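As a rough sketch of this encoding (again in Python, with hypothetical names such as `build_trajectory` and the `"out"` marker standing in for the “not-in-park” value), each five-minute slot simply records the attraction the visitor last checked into:

```python
def build_trajectory(checkins, day_start, n_slots=192, slot=300, outside="out"):
    """Encode one day as a fixed-length sequence of locations (illustrative sketch).

    checkins: list of (timestamp_seconds, attraction_id), sorted by time.
              Leaving the park can be modelled as a check-in to `outside`.
    day_start: timestamp of the first slot (8AM in the text).
    n_slots, slot: 192 slots of 300 seconds cover 8AM-12AM.
    """
    trajectory = []
    idx, current = 0, outside
    for s in range(n_slots):
        slot_time = day_start + s * slot
        # Advance to the most recent check-in at or before this slot.
        while idx < len(checkins) and checkins[idx][0] <= slot_time:
            current = checkins[idx][1]
            idx += 1
        trajectory.append(current)
    return trajectory

# Gate (84) at 8:00, attraction 12 fifteen minutes later, attraction 52 after another 30.
example = build_trajectory([(0, 84), (900, 12), (2700, 52)], day_start=0)
```

The first eleven slots of `example` reproduce the sequence from the text: three slots of 84, six of 12 and two of 52.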
Our interface (above) consists of:
1) Visual queries, clustering and outliers
2) Calendar view
3) Sequence View - temporal check-in sequences per visitor or the most representative check-in sequence of a group
4) Map View - trajectories, heat-maps, animation
The main feature used for group exploration was the “Detect Group” option. This applies agglomerative hierarchical clustering using pre-computed Levenshtein distance matrices for the trajectory sequences. Sequence comparison is a string comparison over time slots (e.g. ‘84’ from 8:55 to 9:00 AM does not match ‘12’ from 8:55 to 9:00 AM). Each distance is normalized by the square root of the maximal length of the two compared sequences. A complete linkage strategy was used for the final clustering, and groups can be extracted via the tolerance control widget (values of 0-7 seemed best).
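The pipeline described above (Levenshtein distance over slot sequences, square-root normalization, complete linkage, cut at a tolerance) can be illustrated with a small pure-Python sketch. It is a naive version for tiny inputs, not the tool's implementation; the function names are hypothetical, and how the `tolerance` argument maps onto the widget's 0-7 range is an assumption.

```python
from math import sqrt

def levenshtein(a, b):
    """Edit distance between two sequences of location ids."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def normalized_distance(a, b):
    """Levenshtein distance divided by sqrt of the longer sequence length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / sqrt(longest) if longest else 0.0

def complete_linkage(seqs, tolerance):
    """Merge clusters bottom-up until the closest pair exceeds the tolerance."""
    n = len(seqs)
    dist = [[normalized_distance(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                link = max(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or link < best[0]:
                    best = (link, p, q)
        if best[0] > tolerance:
            break
        _, p, q = best
        clusters[p] += clusters[q]
        del clusters[q]
    return clusters

seqs = [[84, 84, 12, 12], [84, 84, 12, 12], [84, 12, 12, 12], [84, 52, 52, 52]]
groups = complete_linkage(seqs, tolerance=0.5)
```

With this toy input the first three visitors, whose sequences differ by at most one slot, end up in one local-group and the fourth stays alone.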
Once the clustering is done, patrons belong to local-groups (i.e., patrons who travel the park together). These local-groups range in size from 2 to 44 visitors. We detected ~2300 such local-groups. For visualization purposes, an average movement sequence is computed for each local-group by taking, at each time slice, the location visited by the most members.
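A minimal sketch of such a representative sequence, assuming all member trajectories have the same length, is a per-slice majority vote (the name `representative_sequence` is hypothetical):

```python
from collections import Counter

def representative_sequence(member_trajectories):
    """For each time slice, keep the location that most group members are at."""
    return [Counter(locations).most_common(1)[0][0]
            for locations in zip(*member_trajectories)]

members = [[84, 84, 12, 12],
           [84, 12, 12, 12],
           [84, 84, 12, 52]]
summary = representative_sequence(members)
```

Individual deviations, such as the third member's detour to attraction 52 in the last slice, are dropped from the summary.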
To find group types (groups that have similar interests within the park but do not necessarily travel together) we create a feature vector for each local-group (number of thrill rides visited, etc.) and cluster the local-groups using k-means. This gives us group types. We also support a view to explore feature vectors of individuals, local-groups and group types (below).
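To illustrate the idea, the sketch below runs a bare-bones k-means on made-up feature vectors. The three features (thrill rides, kiddie rides, food stops), the toy values and the naive initialization from the first k points are all assumptions for the example, not the features or settings used in our tool.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; centers start at the first k points (naive choice)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(p, centers[c])))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# One hypothetical vector per local-group: (thrill rides, kiddie rides, food stops).
features = [(9, 0, 1), (1, 0, 6), (8, 1, 2), (2, 0, 5)]
labels, centers = kmeans(features, k=2)
```

Here the ride-heavy local-groups and the food-heavy local-groups fall into two separate group types.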
Groups identified include:
1. Tours
a) 33-44 patrons per local-group
b) Visits lots of the park but few kid rides.
c) ~32 local-groups (out of 2300)
d) Beer garden visit between nearly every ride.
e) Likely comes via bus, little interest in the shows and pavilions.
f) Volume ticket purchase discount.
2. Smaller adult groups
a) 2-11 patrons per local-group
b) Mostly do thrill rides.
c) ~100 local-groups (out of 2300)
d) Do not use the park’s overnight accommodations.
e) Groups are probably private arrangements of friends/colleagues.
f) Adding new rollercoasters could increase return visits.
3. Stage group
a) 8 patrons per local-group
b) Goes to the stage for every show.
c) 1 local-group (out of 2300)
d) They don’t check in and arrive ~5 minutes before the shows start.
e) Probably the soccer star’s staff or security for the show.
f) For star visits, a private entrance nearer the stage could be important.
4. Half-Day patrons
a) 2-7 patrons per local-group
b) They like short visits.
c) ~270 local-groups (out of 2300)
d) They either come in the early morning or after lunch.
e) They are mainly in the park for a few main attractions.
f) The park owners should offer half-day admission fees.
The screenshot shows heat maps of different groups within this group type as small multiples. Visitors within these groups go to very similar rides.
5. Foodies
a) 2-11 patrons per local-group
b) They like food (2 to 5 hours at restaurants)
c) ~250 local-groups (out of 2300)
d) This group stays at food places 2-5 hours.
e) This group is only active in the morning.
f) Offer an eating pass that includes samples at various park restaurants.
6. Shoppers
a) 2-11 patrons per local-group.
b) Finish their day with several hours of shopping.
c) ~500 local-groups (out of 2300)
d) Always buy lunch at park too.
e) Big spenders at the park.
f) Offer a buy x get y free deal.
(1) Typical shopping group (shops mainly before leaving the park, with short breaks). (2) By filtering for groups that shop often we can compare their features in a small multiple view (3).
7. Nappers
a) ~1-6 patrons per local-group.
b) Leaves the park near lunchtime
c) ~50 local-groups (out of 2300)
d) May be a more budget-conscious group
e) Leaves the park either for different/cheaper food or to take a rest.
f) Offer cheaper lunch options.
8. The Non-Check-In Group
a) ~1 patron per local-group
b) Either their tracking devices are erroneous, or they are not normal visitors (i.e. staff).
c) ~70 local-groups (out of 2300)
d) When we use our method to infer check-ins these sequences are quite short and contain few rides.
e) Technologically challenged.
f) Devices should be more stable.
MC1.2 – Are there notable differences in the patterns of activity in the park across the three days? Please describe the notable differences you see.
Limit your response to no more than 3 images and 300 words.
To explore patterns of activity in the park we employed a traditional calendar view and also experimented with a new “probability view” that shows where and how patrons travel around the park.
The calendar view shows the number of patrons checked in to a ride, aggregated at half-hour intervals. You can view the counts for Friday, Saturday, Sunday, and all three days in the “Any Day” view. The “Every Day” view shows ids that were at the same location at the same time on every day, but it was not used here. The following image shows daily routines and anomalies from the calendar view.
1) Entries/exits are busiest during opening/closing hours.
2) Restaurants are empty until noon.
3) Shops are busiest in the evening hours.
4) There are two shows a day at the Grinosaurus stage. However, on Sunday, there is only one show in the morning. This might be related to the vandalism and stolen medals. In general, the pavilion is not visited during the shows (probably closed) but busy at all other times. On Sunday the pavilion is also empty in the afternoon. We hypothesize that during the first show on Sunday the medals get stolen and there is vandalism in the pavilion. Then the pavilion is shut down for the rest of the day.
5) Several attractions break down during the weekend; for example, the Flying TyrAndrienkos is empty on Sunday for 30 minutes.
6) All movements/check-ins after 8:30PM on Friday are missing.
The probability view shows the most likely attraction a visitor will attend next. The above figure shows slight differences between the three days. We used this view to check whether an obvious Markov model fit the data, but the data showed that, wherever visitors were, their most likely next stop was a thrill ride.
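The quantity behind such a view can be sketched as first-order transition probabilities estimated from visit sequences. The snippet below is an illustration only; the function name `transition_probabilities` and the category labels are hypothetical, and repeated slots at the same location are ignored by assumption.

```python
from collections import Counter, defaultdict

def transition_probabilities(visit_sequences):
    """Estimate P(next location | current location) from observed sequences."""
    counts = defaultdict(Counter)
    for seq in visit_sequences:
        for here, nxt in zip(seq, seq[1:]):
            if here != nxt:  # staying in place is not counted as a move
                counts[here][nxt] += 1
    return {here: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for here, row in counts.items()}

visits = [["entry", "thrill", "food", "thrill"],
          ["entry", "thrill", "shop"],
          ["entry", "food", "thrill"]]
probs = transition_probabilities(visits)
```

For each current location, the entry with the highest probability is the "most likely next attraction" that the view would display.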
MC1.3 – What anomalies or unusual patterns do you see? Describe no more than 10
anomalies, and prioritize those unusual patterns that you think are most likely
to be relevant to the crime.
Limit your response to no more than 10 images and 500 words.
In order to explore anomalies and unusual patterns of movement related to the crime, we have developed a visual query interface based on Boolean operations that can find patrons who visit specific locations at specific times of the day.
To do this, first a Boolean operation is selected (e.g. OR) (1). Using this operator the analyst can select time and location cells from the calendar view. These selections constrain the query (2). In this example the user queries for all users that never went to the soccer star’s stage performance. This is done in the Any Day view. Finally, the query is executed (3) and returns a number of trajectories that can be investigated in the sequence view.
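Conceptually, such a query filters the slot trajectories by a set of selected (time slot, location) cells. The following sketch shows one possible reading; the operator names `"or"`, `"and"` and `"not"`, the function name `query` and the location id 32 are hypothetical and only serve the example.

```python
def query(trajectories, cells, op="or"):
    """Return the visitor ids whose trajectory satisfies the selected cells.

    trajectories: dict mapping visitor id -> list of locations per time slot.
    cells: list of (slot_index, location) pairs picked in the calendar view.
    op: "or" (any cell matches), "and" (all match), "not" (none matches).
    """
    result = []
    for visitor, traj in trajectories.items():
        hits = [traj[slot] == location for slot, location in cells]
        if op == "or":
            keep = any(hits)
        elif op == "and":
            keep = all(hits)
        elif op == "not":
            keep = not any(hits)
        else:
            raise ValueError("unknown operator: " + op)
        if keep:
            result.append(visitor)
    return sorted(result)

trajectories = {1: [84, 32, 32], 2: [84, 12, 12], 3: [84, 12, 32]}
show_cells = [(1, 32), (2, 32)]  # location 32 stands in for the stage
attended = query(trajectories, show_cells, op="or")
never_attended = query(trajectories, show_cells, op="not")
```

The returned ids can then be handed to the sequence view for closer inspection.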
Using the visual query tool, we can, for example, query for all users that went to Creighton Pavilion during the show at the Grinosaurus stage, when it was supposed to be closed (1,2).
While most of the sequences are identical (indicating that this is probably a group that visits the park together), there are a few sequences that are different (2, in the red oval). We also note that without our approach to infer visits there is just a single check-in in the park around 9:30 AM, when the Pavilion should be closed (3, left), which means that all others found (43 visitors) were there but did not check in. They might be staff or could have entered the building without permission. Querying for sequences similar to that of the suspicious person reveals two nearly identical sequences (3, right – dashed rectangle). The first one, however, has an additional check-in at 9:30 AM.
On Sunday another visitor (id: 1983765) first goes to the Pavilion, then spends 2 hours at the Scholtz Express (attraction 20), and exits the park at 11:45 AM. The crime is discovered between 11:30 and 12:00. Could he have committed the crime between 9:00 and 9:10? The pavilion is closed after 9:00 but by 9:30. Security possibly failed to notice the crime until re-opening.
Along with the visual query tool, we have also created a method for detecting outliers in the trajectory sequences. For the outlier detection we again use the Levenshtein distance matrix. Here we simply sum up all distances for each sequence and sort the sequences by this sum in descending order. The sequences that are farthest from all other sequences can be considered outliers. The analyst may query for the n most anomalous sequences. Outlier detection (1) shows the most different sequences (compared to all others). Here we find sequences (2) that are very short, that contain only a park entry but no check-ins (especially when we do not consider inferred check-ins), and sequences without any ride or thrill ride visit.
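The ranking step can be sketched as follows, assuming a symmetric pairwise distance matrix is already available; the function name `rank_outliers` and the toy matrix are hypothetical.

```python
def rank_outliers(dist, n):
    """Return the indices of the n sequences with the largest summed distance."""
    totals = [(sum(row), i) for i, row in enumerate(dist)]
    totals.sort(reverse=True)  # largest total distance first
    return [i for _, i in totals[:n]]

# Toy matrix: sequence 3 is far from everything else.
dist = [[0, 1, 1, 5],
        [1, 0, 1, 5],
        [1, 1, 0, 6],
        [5, 5, 6, 0]]
most_anomalous = rank_outliers(dist, n=2)
```

Summing over a whole row favors sequences that differ from nearly everyone, which matches the very short or ride-free sequences described above.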